Cross-Lingual Sense Determination: Can It Work?

Author

  • Nancy Ide
Abstract

This article reports the results of a preliminary analysis of translation equivalents in four languages from different language families, extracted from an on-line parallel corpus of George Orwell’s Nineteen Eighty-Four. The goal of the study is to determine the degree to which translation equivalents for different meanings of a polysemous word in English are lexicalized differently across a variety of languages, and to determine whether this information can be used to structure or create a set of sense distinctions useful in natural language processing applications. A coherence index is computed that measures the tendency for different senses of the same English word to be lexicalized differently, and from this data a clustering algorithm is used to create sense hierarchies.
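
The abstract does not give the formula for the coherence index or name the clustering algorithm, so the following is only an illustrative sketch of the general idea, not the article's method. It assumes a hypothetical toy table of translations per sense and language, scores a pair of senses by the fraction of languages that translate both with the same word (a stand-in measure, named `shared_fraction` here), and builds a hierarchy with simple average-link agglomerative clustering.

```python
from itertools import combinations

# Hypothetical toy data (not taken from the article): the translation
# chosen for each sense of English "bank" in three languages.
translations = {
    "bank/river": {"fr": "rive", "de": "Ufer", "es": "orilla"},
    "bank/money": {"fr": "banque", "de": "Bank", "es": "banco"},
    "bank/building": {"fr": "banque", "de": "Bank", "es": "banco"},
}


def shared_fraction(a, b):
    """Fraction of languages in which senses a and b share a translation.

    This is an assumed stand-in for a coherence measure: a high value
    means the two senses tend NOT to be lexicalized differently.
    """
    langs = translations[a].keys() & translations[b].keys()
    if not langs:
        return 0.0
    same = sum(translations[a][lang] == translations[b][lang] for lang in langs)
    return same / len(langs)


def average_link(x, y):
    """Mean pairwise score between two clusters of senses."""
    return sum(shared_fraction(a, b) for a in x for b in y) / (len(x) * len(y))


def cluster(senses):
    """Average-link agglomerative clustering; returns the merge history."""
    clusters = [frozenset([s]) for s in senses]
    merges = []
    while len(clusters) > 1:
        # Merge the pair of clusters whose senses are most often
        # translated by the same word.
        x, y = max(combinations(clusters, 2), key=lambda p: average_link(*p))
        merges.append((set(x), set(y), average_link(x, y)))
        clusters = [c for c in clusters if c not in (x, y)] + [x | y]
    return merges


if __name__ == "__main__":
    for left, right, score in cluster(list(translations)):
        print(sorted(left), "+", sorted(right), "score:", score)
```

On this toy input the two financial senses merge first, because every language gives them the same word, and the river sense joins last. Real data would need per-occurrence counts rather than a single translation per sense.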

Similar resources

Document Representation with Statistical Word Senses in Cross-Lingual Document Clustering

Cross-lingual document clustering is the task of automatically organizing a large collection of multi-lingual documents into a few clusters, depending on their content or topic. It is well known that language barrier and translation ambiguity are two challenging issues for cross-lingual document representation. To this end, we propose to represent cross-lingual documents through statistical wor...

UvT-WSD1: A Cross-Lingual Word Sense Disambiguation System

This paper describes the Cross-Lingual Word Sense Disambiguation system UvT-WSD1, developed at Tilburg University, for participation in two SemEval-2 tasks: the Cross-Lingual Word Sense Disambiguation task and the Cross-Lingual Lexical Substitution task. The UvT-WSD1 system makes use of k-nearest neighbour classifiers, in the form of single-word experts for each target word to be disambiguated. ...

OWNS: Cross-lingual Word Sense Disambiguation Using Weighted Overlap Counts and Wordnet Based Similarity Measures

We report here our work on English French Cross-lingual Word Sense Disambiguation where the task is to find the best French translation for a target English word depending on the context in which it is used. Our approach relies on identifying the nearest neighbors of the test sentence from the training data using a pairwise similarity measure. The proposed measure finds the affinity between two...

Word Sense Subjectivity for Cross-lingual Lexical Substitution

We explore the relation between word sense subjectivity and cross-lingual lexical substitution, following the intuition that good substitutions will transfer a word’s (contextual) sentiment from the source language into the target language. Experiments on English-Chinese lexical substitution show that taking a word’s subjectivity into account can indeed improve performance. We also show that ju...

A Comparison of Word Embeddings for English and Cross-Lingual Chinese Word Sense Disambiguation

Word embeddings are now ubiquitous forms of word representation in natural language processing. There have been applications of word embeddings for monolingual word sense disambiguation (WSD) in English, but few comparisons have been done. This paper attempts to bridge that gap by examining popular embeddings for the task of monolingual English WSD. Our simplified method leads to comparable sta...

Journal title:
  • Computers and the Humanities

Volume 34  Issue 

Pages  -

Publication date 2000